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Abstract — A cross-layer design along with an optimal resource 
allocation framework is formulated for wireless fading networks, 
where the nodes are allowed to perform network coding. The aim 
is to jointly optimize end-to-end transport layer rates, network 
code design variables, broadcast link flows, link capacities, aver- 
age power consumption, and short-term power allocation policies. 
As in the routing paradigm where nodes simply forward packets, 
the cross-layer optimization problem with network coding is 
non-convex in general. It is proved however, that with network 
coding, dual decomposition for multicast is optimal so long as 
the fading at each wireless link is a continuous random variable. 
This lends itself to provably convergent subgradient algorithms, 
which not only admit a layered-architecture interpretation but 
also optimally integrate network coding in the protocol stack. 
The dual algorithm is also paired with a scheme that yields near- 
optimal network design variables, namely multicast end-to-end 
rates, network code design quantities, flows over the broadcast 
links, link capacities, and average power consumption. Finally, 
an asynchronous subgradient method is developed, whereby the 
dual updates at the physical layer can be affordably performed 
with a certain delay with respect to the resource allocation 
tasks in upper layers. This attractive feature is motivated by the 
complexity of the physical layer subproblem, and is an adaptation 
of the subgradient method suitable for network control. 

Index Terms — Network coding, cross-layer designs, optimiza- 
tion methods, asynchronous subgradient methods, multihop. 



I. Introduction 

Traditional networks have always assumed nodes capable 
of only forwarding or replicating packets. For many types 
of networks however, this constraint is not inherently needed 
since the nodes can invariably perform encoding functions. 
Interestingly, even simple linear mixing operations can be 
powerful enough to enhance the network throughput, minimize 
delay, and decrease the overall power consumption IT], ||2l- 
For the special case of single-source multicast, which does 
not even admit a polynomial-time solution within the routing 
framework fS), linear network coding achieves the full network 
capacity ||4|. In fact, the network flow description of multicast 
with random network coding adheres to only linear inequality 
constraints reminiscent of the corresponding description in 
unicast routing lO. 

This encourages the use of network coding to extend several 
popular results in unicast routing framework to multicast with- 
out appreciable increase in complexity. Of particular interest 
is the resource allocation and cross-layer optimization task in 
wireless networks ||6l, Q. The objective here is to maximize 
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a network utility function subject to flow, rate, capacity and 
power constraints. This popular approach not only offers the 
flexibility of capturing diverse performance objectives, but 
also admits a layering interpretation, arising from different 
decompositions of the optimization problem [S]. 

This paper deals with cross-layer optimization of wireless 
multicast networks that use network coding and operate over 
fading links. The aim is to maximize a total network utility 
objective, and entails finding end-to-end rates, network code 
design variables, broadcast link flows, link capacities, average 
power consumption, and instantaneous power allocations. 

Network utility maximization was first brought into coded 
networks in JS], where the aim was to minimize a generic cost 
function subject only to flow and rate constraints. The optimal 
flow and rate variables may then be converted to a practical 
random network coding implementation using methods from 
|[9] and ifTOl . Subsequent works extended this framework 
to include power, capacity, and scheduling constraints [|TT|- 
[[l4l . The interaction of network coding with the network and 
transport layers has also been explored in lfT5l - lfT9l ; in these 
works, networks with fixed link capacities are studied, and 
different decomposition techniques result in different types of 
layered architectures. 

There are however caveats associated with the utility max- 
imization problem in wireless networks. First, the power 
control and scheduling subproblems are usually non-convex. 
This implies that the dual decomposition of the overall prob- 
lem, though insightful, is not necessarily optimal and does 
not directly result in a feasible primal solution. Second, for 
continuous fading channels, determining the power control 
policy is an infinite dimensional problem. Existing approaches 
in network coding consider either deterministic channels IfTTI . 
|[T4l . or, links with a finite number of fading states lfT2l . 1201 . 

mi 

On the other hand, a recent result in unicast routing shows 
that albeit the non-convexity, the overall utility optimization 
problem has no duality gap for wireless networks with contin- 
uous fading channels 1221 . As this is indeed the case in all real- 
life fading environments, the result promises the optimality of 
layer separation. In particular, it renders a dual subgradient 
descent algorithm for network design optimal J23]. 

The present paper begins with a formulation that jointly 
optimizes end-to-end rates, virtual flows, broadcast link flows, 
link capacities, average power consumption, and instantaneous 
power allocations in wireless fading multicast networks that 
use intra-session network coding (Section HHi. The first contri- 
bution of this paper is to introduce a realistic physical layer 
model formulation accounting for the capacity of broadcast 
links. The cross-layer problem is generally non-convex, yet it 
is shown to have zero duality gap (Section [III- Al l. This result 
considerably broadens ll22l to coded multicast networks with 
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broadcast links. The zero duality gap is then leveraged in order 
to develop a subgradient descent algorithm that minimizes the 
dual function (Sections IIII-BI IIII-Cl l. The algorithm admits a 
natural layering interpretation, allowing optimal integration of 
network coding into the protocol stack. 

In Section HV] the subgradient algorithm is modified so that 
the component of the subgradient that results from the physical 
layer power allocation may be delayed with respect to opera- 
tions in other layers. This provably convergent asynchronous 
subgradient method and its online implementation constitute 
the second major contribution. Unlike the algorithm in 1231 . 
which is used for offline network optimization, the algorithm 
developed here is suitable for online network control. Con- 
vergence of asynchronous subgradient methods for dual mini- 
mization is known under diminishing stepsize 1241 : the present 
paper proves results for constant stepsize. Near-optimal primal 
variables are also recovered by forming running averages of 
the primal iterates. This technique has also been used in 
synchronous subgradient methods for convex optimization; see 
e.g., l25l and references therein. Here, ergodic convergence 
results are established for the asynchronous scheme and the 
non-convex problem at hand. Finally, numerical results are 
presented in Section [V] and Section [Vl] concludes the paper. 

II. Problem Formulation 

Consider a wireless network consisting of a set of terminals 
(nodes) denoted by M. The broadcast property of the wireless 
interface is modeled by using the concept of hyperarcs. A 
hype rare is a pair (i, J) that represents a broadcast link from 
a node i to a chosen set of nodes J C Af. The entire network 
can therefore be represented as a hypergraph H = {J\f, A), 
where A is the set of hyperarcs. The complexity of the 
model is determined by the choice of the set A. Let the 
neighbor-set N{i) denote the set of nodes that node i reaches. 
An exhaustive model mig ht include all possible 2l^(*)l - 1 
hyperarcs from node i. On the other hand, a simpler model 
might include only a smaller number of hyperarcs per node. 
A point-to-point model is also a special case when node i has 
\N{i)\ hyperarcs each containing just one receiver. 

The present work considers a physical layer whereby the 
channels undergo random multipath fading. This model allows 
for opportunistically best schedules per channel realization. 
This is different from the link-level network models in jS], 
|[T2l . ifTsl . l2n . where the hyperarcs are modeled as erasure 
channels. The next subsection discusses the physical layer 
model in detail. 



A. Physieal Layer 

In the current setting, terminals are assumed to have a set of 
tones T available for transmission. Let hi- denote the power 
gain between nodes i and j over a tone j ^ T, assumed 
random, capturing fading effects. Let h represent the vector 
formed by stacking all the channel gains. The network operates 
in a time slotted fashion; the channel h remains constant for 
the duration of a slot, but is allowed to change from slot to slot. 
A slowly fading channel is assumed so that a large number of 
packets may be transmitted per time slot. The fading process 
is modeled to be stationary and ergodic. 



Since the channel changes randomly per time slot, the 
optimization variables at the physical layer are the channel 
realization-specific power allocations p{j(h) for all hyperarcs 
(i, J) e A, and tones f E For convenience, these 
power allocations are stacked in a vector p(h). Instantaneous 
power allocations may adhere to several scheduling and mask 
constraints, and these will be generically denoted by a bounded 
set n such that p(h) e 11. The long-term average power 
consumption by a node i is given by 



(1) 



where E[.] denotes expectation over the stationary channel 
distribution. 

For slow fading channels, the information-theoretic capacity 
of a hyperarc {i, J) is defined as the maximum rate at which 
all nodes in J receive data from i with vanishing probability 
of error in a given time slot. This capacity depends on 
the instantaneous power allocations p(h) and channels h. 
A generic bounded function C/j(p(h),h) will be used to 
describe this mapping. Next we give two examples of the 
functional forms of C/j(-) and 11. 

Example 1. Conflict graph model: The power allocations pjj 
adhere to the spectral mask constraints 



0<p{j< P\ 



max 



(2) 



However, only conflict-free hyperarc are allowed to be sched- 
uled for a given h. Specifically, power may be allocated to 
hyperarcs (ii, Ji) and (i2, J2) if and only if lT3l 

i) «i 7^ 12; 

ii) ii ^ J2 and 12 ^ Ji (half-duplex operation); and 
iii-a) Ji n J2 = (primary interference), or additionally, 
iii-b) Ji n N{i2) = J2 H N(ii) — (secondary interference). 
The set II therefore consists of all possible power allocations 
that satisfy the previous properties. 

Due to hyperarc scheduling, all transmissions in the network 
are interference free. The signal-to-noise ratio (SNR) at a node 
j G J is given by 



f 



4ji 



(3) 



where Nj is the noise power at j. In a broadcast setting, the 
maximum rate of information transfer from i to each node in 
J is 



Cf,(p(h),h)=minlog(l 



r{,,(p(h),h)). (4) 



ijj 



A similar expression can be written for the special case of 
point-to-point links by substituting hyperarcs {i, J) by arcs 
in the expression for F^j^ (p(h), h). 
For slow-fading channels, Gaussian codebooks with suf- 
ficiently large block lengths achieve this capacity in every 
time slot. More realistically, an SNR penalty term p can 
be included to account for finite-length practical codes and 
adaptive modulation schemes, so that 



C/,(p(h),h) 



min log 



i + r{,.(p(h),h)/p). (5) 



- iJj 



The penalty term is in general a function of the target bit error 
rate. 
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Example 2. Signal-to-interference-plus-noise-ratio (SINR) 
model: Here, the constraint set 11 is simply a box set Bp, 

n = Bp := {p{j\0 < p{j < V (z, J) e ^ and / e ^ }. 

(6) 

The set Bp could also include (instantaneous) sum-power 
constraints per node. The capacity is expressed as in (HI or 
(|5]l, but now the SNR is replaced by the SINR, given by 

r{,^ (p(h), h) = p{j(h)h{^/ {N, + i^^^ + i^f + i^j^f) . 

(7) 

The denominator consists of the following terms; 

• Interference from other nodes' transmissions to node j 



7-int 



E 

{k.M)eA:jeAI. 



(8a) 



"Self-interference" due to transmissions of node j 

^-^n E P^A^(h)- (8b) 

J\/:(j,A/)e^ 

This term is introduced to encourage half-duplex opera- 
tion by setting hjj to a large value. 
"Broadcast-interference" from transmissions of node i to 
other hyperarcs 



TDioaa _ o/, 



E PfM(h). 



(8c) 



A/:(i,A/)e^ 



This term is introduced to force node i to transmit at most 
over a single hyperarc, by setting /3 to a large value. 
The previous definitions ignore interference from non- 
neighboring nodes. However, they can be readily extended to 
include more general interference models. 

The link layer capacity is defined as the long-term average 
of the total instantaneous capacity, namely. 



^C£,(p(h),h) 



/ 



(9) 



This is also called ergodic capacity and represents the maxi- 
mum average data rate available to the link layer. 

B. Link Layer and Above 

The network supports multiple multicast sessions indexed 
by TO, namely Sm ■= (s™, T™, a™), each associated with 
a source node s™, sink nodes T"' C M, and an average 
flow rate a™ from s'" to each t £ T™. The value a™ is the 
average rate at which the network layer of source terminal s"* 
admits packets from the transport layer Traffic is considered 
elastic, so that the packets do not have any short-term delay 
constraints. 

Network coding is a generalization of routing since the 
nodes are allowed to code packets together rather than simply 
forward them. This paper considers intra-session network 
coding, where only the traffic belonging to the same multicast 
session is allowed to mix. Although better than routing in 
general, this approach is still suboptimal in terms of achieving 
the network capacity. However, general (inter-session) network 



coding is difficult to characterize or implement since neither 
the capacity region nor efficient network code designs are 
known HI Part II]. On the other hand, a simple linear coding 
strategy achieves the full capacity region of intra-session 
network coding ||4l. 

The network layer consists of endogenous flows of coded 
packets over hyperarcs. Recall that the maximum average rate 
of transmission over a single hyperarc cannot exceed Cij. Let 
the coded packet-rate of a multicast session to over hyperarc 
{i, J) be (also referred to as the subgraph or broadcast 
link flow). The link capacity constraints thus translate to 



z^<cu V(z,J)eA 



(10) 



To describe the intra-session network coding capacity re- 
gion, it is commonplace to use the concept of virtual flow 
between terminals i and j corresponding to each session 
to and sink t e T"' with average rate a;™*. These virtual 
flows are defined only for neighboring pairs of nodes i.e., 
[hi] ^ G ■■= {{i,j)\{i,J) e A,j e J}. The virtual flows 
satisfy the flow-conservation constraints, namely, 

{a™ if i = s", 
-a™ ifi = t, 
otherwise 

(11) 

for all m, t <E T™, and i G A/". Hereafter, the set of equations 
for i ^ t will be omitted because they are implied by the 
remaining equations. 

The broadcast flows z") and the virtual flows a;"'* can be 
related using results from the lossy-hyperarc model of 15], 
[TTl. Specifically, |fT3l eq. (9)] relates the virtual flows and 
subgraphs, using the fraction bijK G [0, 1] of packets injected 
into the hyperarc (i, J) that reach the set of nodes K C N{i). 
Recall from Section Hl-AI that here the instantaneous capacity 
function Clj{-) is defined such that all packets injected into 
the hyperarc (i, J) are received by every node in J. Thus in 
our case, huK = 1 whenever A' n J 7^ and consequently, 



E< 



< 



E 

J:{t,J)eA 



VK C N{i),i eM^m^teT" 



(12) 

Note the difference with ifTSl where at every time slot, 
packets are injected into a fixed set of hyperarcs at the same 
rate. The problem in [131 is therefore to find a schedule of 
hyperarcs that do not interfere (the non-conflicting hyperarcs). 
The same schedule is used at every time slot; however, only a 
random subset of nodes receive the injected packets in a given 
slot. Instead here, the hyperarc selection is part of the power 
allocation problem at the physical layer, and is done for every 
time slot. The transmission rate (or equivalently, the channel 
coding redundancy) is however appropriately adjusted so that 
all the nodes in the selected hyperarc receive the data. 

In general, for any feasible solution to the set of equations 
(fT0li-(fT2]i. a network code exists that supports the correspond- 
ing exogenous rates a™ Q. This is because for each multicast 
session to, the maximum flow between s™ and t G T™ is 
a™, and is therefore achievable ||4l Th. 1]. Given a feasible 
solution, various network coding schemes can be used to 
achieve the exogenous rates. Random network coding based 



IEEE/ACM TRANSACTIONS ON NETWORKING (ACCEPTED) 



4 



implementations such as those proposed in Q and IfTOll . 
are particularly attractive since they are fully distributed and 
require little overhead. These schemes also handle any residual 
errors or erasures that remain due to the physical layer. 

The system model also allows for a set of "box constraints" 
that limit the long-term powers, transport layer rates, broadcast 
link flow rates, virtual flow rates as well as the maximum link 
capacities. Combined with the set H, these constraints can be 
compactly expressed as 

S:={y,p(h)| p(h) en, 0<p, <pr", 

<in < «™ < <ax, < ca < , 

0<2,D<^a^ 0<x™*<x--}. (13) 

Here y is a super-vector formed by stacking all the average 
rate and power variables, that is, a™, z™j, a;™*, qj, and pi. 
Parameters with min/max subscripts or superscripts denote 
prescribed lower/upper bounds on the corresponding variables. 



packets and stored at nodes in queues (and virtual queues for 
virtual flows) fTO]|. The constraints in ( fT4l i guarantee that all 
queues are stable. 

Optimization problem ( fT4] i is non-convex in general, and 
thus difficult to solve. For example, in the conflict graph 
model, the constraint set 11 is discrete and non-convex, while 
in the SINR-model, the capacity function C/j (p(h),h) is a 
non-concave function of p(h); see e.g., Il26l . 161. The next 
section analyzes the Lagrangian dual of ( fT4] l. 

III. Optimality of Layering 

This section shows that ( fT4l i has zero duality gap, and 
solves the dual problem via subgradient descent iterations. The 
purpose here is two-fold: i) to describe a layered architecture 
in which linear network coding is optimally integrated; and 
ii) to set the basis for a network implementation of the 
subgradient method, which will be developed in Section |IV| 



C. Optimal Resource Allocation 

A common objective of the network optimization problem 
is maximization of the exogenous rates a™ and minimization 
of the power consumption pi. Towards this end, consider 
increasing and concave utility functions Um{a"^) and convex 
cost functions Vi{pi) so that the overall objective function 
/(y) = J2m Um{a"^)-Y.i ^bi) is concave. For example, the 
utility function can be the logarithm of session rates and the 
cost function can be the squared average power consumption. 
The network utility maximization problem can be written as 



P = max V f/™(a™) - V V,{p,) 



(14a) 



s. t. a™ < E - E 



(14b) 



< 



E 



J:{l,.J)eA 

Jr\K^$ 



< CiJ 



cu < E 



E E pL(^ 

f J:iiJ)eA 



< 



\f K C N{i),m,te T™ 

(14c) 

V(i,J)e^ (14d) 
V(i,J)e^ (14e) 

P^ (14f) 



where i Q Af. Note that constraints ([U, (|9|l and ( fTTT i have been 
relaxed without increasing the objective function. For instance, 
the relaxation of ( fTTT i is equivalent to allowing each node to 
send at a higher rate than received, which amounts to adding 
virtual sources at all nodes i ^ t. However, adding virtual 
sources does not result in an increase in the objective function 
because the utilities [/,„ depend only on the multicast rate a"'. 

The solution of the optimization problem ( fT4] i gives the 
throughput a™ that is achievable using optimal virtual flow 
rates x™* and power allocation policies p(h). These virtual 
flow rates are used for network code design. When imple- 
menting coded networks in practice, the traffic is generated in 



A. Duality Properties 

Associate Lagrange multipliers ly™*, rf^, Xij and fii 
with the flow constraints ( |14b| i, the union of flow constraints 
(|14c| i, the link rate constraints (|14d| i, the capacity constraints 
(I14el i. and the power constraints (I14fb , respectively. Also, let 
C be the vector formed by stacking these Lagrange multipliers 
in the aforementioned order. Similarly, if inequalities ( I14bl i- 
(I14fb are rewritten with zeros on the right-hand side, the vector 
q(y,p(h)) collects all the terms on the left-hand side of the 
constraints. The Lagrangian can therefore be written as 

c{y, p(h), c) = J2 '^-(°'") - E ^'(^^■') - p(h))- 

(15) 



The dual function and the dual problem are, respectively, 
giC) --^ max C{y,p{h)X) 

(y,P(h))GS 



D = min p(C)- 
C>o 



(16) 
(17) 



Since (|14e| i may be a non-convex constraint, the duality gap 
is in general, non-zero; i.e., D > P. Thus, solving ( fTTI i yields 
an upper bound on the optimal value P of ( fT4] i. In the present 
formulation however, we have the following interesting result. 

Proposition 1. If the fading is continuous, then the duality 
gap is exactly zero, i.e., P = D. 

A generalized version of Proposition [T] including a formal 
definition of continuous fading, is provided in AppendixlAland 
connections to relevant results are made. The essential reason 
behind this strong duality is that the set of ergodic capacities 
resulting from all feasible power allocations is convex. 

The requirement of continuous fading channels is not 
limiting since it holds for all practical fading models, such 
as Rayleigh, Rice, or Nakagami-m. Recall though that the 
dual problem is always convex. The subgradient method has 
traditionally been used to approximately solve (fTTT i, and also 
provide an intuitive layering interpretation of the network 
optimization problem JSj- The zero duality gap result is 
remarkable in the sense that it renders this layering optimal. 

A corresponding result for unicast routing in uncoded net- 
works has been proved in ll22l . The fact that it holds for coded 
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networks with broadcast links, allows optimal integration of 
the network coding operations in the wireless protocol stack. 
The next subsection deals with this subject. 

B. Subgradient Algorithm and Layer Separability 

The dual problem ( fTTI l can in general be solved using the 
subgradient iterations 127] Sec. 8.2] indexed by t 

(y(£), p(h; ^)) e arg max C{y, p(h), CW) (18a) 

(y,p(h))eB 

+ 1) = [m + eq(yW, P(h; £))]+ (18b) 

where e is a positive constant stepsize, and [.]+ denotes 
projection onto the nonnegative orthant. The inclusion sym- 
bol (€) allows for potentially multiple maxima. In ( |18bt . 
q(y(£), p(h; ^)) is a subgradient of the dual function q{C,) in 
(fTST i at C(^)- Next, we discuss the operations in ( fTSl l in detail. 

For the Lagrangian obtained from ( fTSl l. the maximization 
in (I18ab can be separated into the following subproblems 



a™(£) e arg max 



z^j{l) G arg max 

0<zrr<2'"r'"' 



e arg max 



tgT" 



KcN{i) t£T" 



K<ZN(i) 

Cij{e) e arg max [£,ij{£) - Xij{£)] Cij 

0<Cij<c™ 

Pi{i) e arg max [iii{l)pi - Vi{pi)\ 
p(h; I) e arg max ^ 7/^(p(h), h, C) 



(19a) 

(19b) 

(19c) 

(19d) 
(19e) 
(19f) 



p(h)Gn 



/,(i,J)e^ 



where 



7f;(p(h), h, C) := A,,,C/^(p(h), h) - /i,pf^(h) (19g) 

and Ijf is the indicator function, which equals one if the 
expression X is true, and zero otherwise. 

The physical layer subproblem (|19fl implies per-fading state 
separability. Specifically, instead of optimizing over the class 
of power control policies, ( |19fl l allows solving for the optimal 
power allocation for each fading state; that is. 



P(p(h))= max E 
p(h)en 



J2 7.{7(P(h),h,C) 



E 



l^f^n E 7/j(p(h),h,C) 



(20) 



Note that problems ( |19at - (|19et are convex and admit ef- 
ficient solutions. The per-fading state power allocation sub- 
problem ( |19fl l however, may not necessarily be convex. For 



example, under the conflict graph model (cf. Example 1), the 
number of feasible power allocations may be exponential in 
the number of nodes. Finding an allocation that maximizes the 
objective function in ( [2Q] i is equivalent to the NP-hard max- 
imum weighted hyperarc matching problem |131. Similarly, 
the capacity function and hence the objective function for the 
SINR model (cf. Example 2) is non-convex in general, and 
may be difficult to optimize. 

This separable structure allows a useful layered interpre- 
tation of the problem. In particular, the transport layer sub- 
problem ( I19al i gives the optimal exogenous rates allowed into 
the network; the network flow sub-problem il9H yields the 
endogenous flow rates of coded packets on the hyperarcs; 
and the virtual flow sub-problem ( I19cb is responsible for 
determining the virtual flow rates between nodes and therefore 
the network code design. Likewise, the capacity sub-problem 
( |19d| i yields the link capacities and the power sub-problem 
( |19et provides the power control at the data link layer 

The layered architecture described so far also allows for 
optimal integration of network coding into the protocol stack. 
Specifically, the broadcast and virtual flows optimized respec- 
tively in (I19bb and ( I19cb . allow performing the combined 
routing-plus-network coding task at network layer An imple- 
mentation such as the one in (TOl typically requires queues for 
both broadcast as well as virtual flows to be maintained here. 

Next, the subgradient updates of ( I18bb become 





-1) 


= 




(21a) 




-1) 






(21b) 




-1) 


= [iu{e) - 




(21c) 




-1) 


= [A,:,/(^) - 




(21d) 




-1) 


= + 




(21e) 



where q{£) are the subgradients at index £ given by 

J2 E <W (22a) 

= E - E (22b) 



^C/^(p(h;^),h) 



E E pIjM 

f J:{i,J)£A 



(22c) 



(22d) 



(22e) 



The physical layer updates ( I21db and ( I21eb are again compli- 
cated since they involve the E[.] operations of ( I22db and (I22eb . 
These expectations can be acquired via Monte Carlo simula- 
tions by solving ( I19fb for realizations of h and averaging over 
them. These realizations can be independently drawn from the 
distribution of h, or they can be actual channel measurements. 
In fact, the latter is implemented in Section |IV] on the fly 
during network operation. 
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C. Convergence Results 

This subsection provides convergence results for the sub- 
gradient iterations ( fTSl l. Since the primal variables (y,p(h)) 
and the capacity function C/j(.) are bounded, it is possible 
to define an upper bound G on the subgradient norm; i.e., 
||q(y(£),p(h;£))|| < G for all £>!. 

Proposition 2. For the subgradient iterations in M9\ and ( 1211 ). 

the best dual value converges to D upto a constant; i.e., 

lim mill g{C{£)) < D + - — . (23) 

s— s-oo 1<1<S 2 

This result is well known for dual (hence, convex) prob- 
lems IZTl Prop. 8.2.3]. However, the presence of an infinite- 
dimensional variable p(h) is a subtlety here. A similar case is 
dealt with in 1221 and Proposition |2] follows from the results 
there. 

Note that in the subgradient method ( fTSl l. the sequence of 
primal iterates {y(^)} does not necessarily converge. However, 
a primal running average scheme can be used for finding the 
optimal primal variables y* as summarized next. Recall that 
/(y) denotes the objective function C^m(a")-I]j ^iiPi)- 

Proposition 3. For the running average of primal iterates 

1 



y(^) 



1=1 



(24) 



the following results hold: 

a) There exists a sequence {p(h;s)} such that 
(y(s), p(h; s)) G B, and also 



lim 



[q(y(s),p(h;s))]+ 



0. 



(25) 



b) The sequence /(y(s)) converges in the sense that 

liminf/(y(5))>P-^ (26a) 



and limsup /(y(s)) < P. 



(26b) 



Equation (|25] | asserts that the sequence {y(^)} together 
with an associated {p(h;£)} becomes asymptotically feasible. 
Moreover, ( |26] l explicates the asymptotic suboptimality as a 
function of the stepsize, and the bound on the subgradient 
norm. Proposition [3] however, does not provide a way to 
actually find {p(h;^)}. 

Averaging of the primal iterates is a well-appreciated 
method to obtain optimal primal solutions from dual subgra- 
dient methods in convex optimization ll25l . Note though that 
the primal problem at hand is non-convex in general. Results 
related to Proposition [3] are shown in ||23]| . Proposition [3] 
follows in this paper as a special case result for a more general 
algorithm allowing for asynchronous subgradients and suitable 
for online network control, elaborated next. 

IV. Subgradient Algorithm for Network Control 

The algorithm in Section IIII-BI finds the optimal operating 
point of ( fT4b in an offline fashion. In the present section, 
the subgradient method is adapted so that it can be used for 
resource allocation during network operation. 

The algorithm is motivated by Proposition [3] as follows. The 
exogenous arrival rates a'"(^) generated by the subgradient 



method [cf. (I19al )l can be used as the instantaneous rate of 
the traffic admitted at the transport layer at time I. Then, 
Proposition [3] guarantees that the long-term average transport 
layer rates will be optimal. Similar observations can be made 
for other rates in the network. 

More generally, an online algorithm with the following 
characteristics is desirable. 

« Time is divided in slots and each subgradient iteration 
takes one time slot. The channel is assumed to remain 
invariant per slot but is allowed to vary across slots. 

• Each layer maintains its set of dual variables, which are 
updated according to (l2Tl i with a constant stepsize e. 

• The instantaneous transmission and reception rates at the 
various layers are set equal to the primal iterates at that 
time slot, found using ( fT9] l. 

• Proposition [3] ensures that the long-term average rates are 
optimal. 

For network resource allocation problems such as those 
described in Q, the subgradient method naturally lends itself 
to an online algorithm with the aforementioned properties. 
This approach however cannot be directly extended to the 
present case because the dual updates ( |21d| )-( [2Te] i require an 
expectation operation, which needs prior knowledge of the 
exact channel distribution function for generation of indepen- 
dent realizations of h per time slot. Furthermore, although 
Proposition [3] guarantees the existence of a sequence of 
feasible power variables p(h; s), it is not clear if one could 
find them since the corresponding running averages do not 
necessarily converge. 

Towards adapting the subgradient method for network con- 
trol, recall that the subgradients q^^ and involve the 
following summands that require the expectation operations 
[cf. (I22dl ) and ( l22e] ll 



Cu{t) :=E 



P^{^) :=E 



^C/j(p(h;£),h) 



f.J:{l.J)eA 



(27) 



(28) 



These expectations can however be approximated by averag- 
ing over actual channel realizations. To do this, the power 
allocation subproblem ( |19fl ) must be solved repeatedly for a 
prescribed number of time slots, say 5", while using the same 
Lagrange multipliers. This would then allow approximation of 
the E operations in ( l27b and ( |28] | with averaging operations, 
performed over channel realizations at these time slots. 

It is evident however, that the averaging operation not only 
consumes S time slots but also that the resulting subgradient 
is always outdated. Specifically, if the current time slot is of 
the form £ = KS + 1 with K = 0,1,2,.. ., the most recent 
approximations of Cij and Pi available are 

Cuie~S) = - ^C^(p(h,;^-5),h,) (29a) 

K.=e~s f 

1 ^"^ 

m-s) = - J2 E p{jO^^;e-s). (29b) 
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Algorithm 1: Asynchronous Subgradient Algorithm 

1 Initialize = and C^,j{l) = P,(l) = 0. 

Let N be the maximum number of subgradient iterations. 

2 for £ = 7, 2, . . . , iV, do 
Calculate primal iterates a'"(^), a;fj*(£), 
ca(£), andp,(£) [cf. (fT9ali-(fT^1. 
Calculate the optimal power allocation p(h£; t(^)) by 
solving (I19fb using and ^(r(^)). 
Update dual iterates k™'(£ + 1), + 1) and 

+1) from the current primal iterates evaluated 
in Line 13 [cf. (l2Tali-(l2Tc]i1. 

6 if £ - t(£) = 5, then 

7 I Calculate Cu{t{C)) and PiCrl^)) as in 
end 

Update the dual iterates + 1) and + 1): 



10 



1) 



1) = 



A.j(£)+e(c,jW- 
/i.W + e(P,(r(£)) 



a,7(r(£))) 



11 end 



Network Control: Use the current iterates a™(£) for 
flow control; a;™*(£) and z™j{l) for routing and 
network coding; Cij{i) for link rate control; and 
p(hf;T(£)) for instantaneous power allocation. 



a) The sequence of dual iterates {C(^)} bounded; and 

b) 77ze best dual value converges to D up to a constant: 

lim min g{C{,£)) < D + — + 2eDGG. (32) 

s— >oo 1<£<S 2 

Thus, the suboptimality in the asynchronous subgradient 
over the synchronous version is bounded by a constant pro- 
portional to D = 25 — 1. Consequently, the asynchronous 
subgradient might need a smaller stepsize (and hence, more 
iterations) to reach a given distance from the optimal. 

The convergence of asynchronous subgradient methods for 
convex problems such as ( fTTj i has been studied in ll24l Sec. 6] 
for a diminishing stepsize. Proposition |4] provides a comple- 
mentary result for constant stepsizes. 

Again, as with the synchronous version, the primal running 
averages also converge to within a constant from the optimal 
value of flAi . This is stated formally in the next proposition. 

Proposition 5. If the maximum delay of the asynchronous 
counterparts of physical layer updates (I21dl l and (I21eb is D, 
then: 

a) There exists a sequence p(h; s) such that 
(y(s), p(h; s)) G B and 



lim 



[q(y(s),p(h;s))] + 



= 0. 



(33) 



b) The sequence /(y(s)) converges in the following sense: 



Here, the power allocations are calculated using (I19fb with 
the old multipliers Xij{£ — S) and — S). The presence 
of outdated subgradient summands motivates the use of an 
asynchronous subgradient method such as the one in |24l. 

Specifically, the dual updates still occur at every time slot 
but are allowed to use subgradients with outdated summands. 
Thus, Cij{£ — S) and Pi{£ — S) are used instead of the 
corresponding E[.] terms in (|22d| i and ( |22e| i at the current time 
£. Further, since the averaging operation consumes another S 
time slots, the same summands are also used for times £ + 1, 
£ + 2, ...,£ + S — 1. At time £ + S, power allocations from 
the time slots £, £ + 1, £ + S — 1 become available, and are 
used for calculating Cij{t) and Pi{l), which then serve as the 
more recent subgradient summands. Note that a subgradient 
summand such as Cij is at least S and at most 25 — 1 slots 
old. 

The asynchronous subgradient method is summarized as 
Algorithm [T] The algorithm uses the function t{£) which 
outputs the time of most recent averaging operation, that is, 

r(£) =max{5[(£-S'-l)/5J +1,1} V £ > 1. (31) 

Note that S < £ ~ t{£) < 25-1. Recall also that the 
subgradient components Cij and Pi are evaluated only at 
times t{£). 

The following proposition gives the dual convergence result 



on this algorithm. Define G as the bound 



[C^ P' 



TlT 



< 



G where C and P are formed by stacking the terms 



E 



EfC6(p(h),h) 



and E 



, respectively. 



Proposition 4. If the maximum delay of the asynchronous 
counterparts of physical layer updates (I21db and (I21eb is D, 
then: 



liminf /(y(s)) > P - 2eDGG (34a) 



and limsup /(y(s)) < P. 



(34b) 



Note that as with the synchronous subgradient, the primal 
running averages are still asymptotically feasible, but the 
bound on their suboptimality increases by a term proportional 
to the delay D in the physical layer updates. Of course, all 
the results in Propositions |4] and |5] reduce to the corresponding 
results in Propositions |2] and [3] on setting D = 0. Interest- 
ingly, there is no similar result for primal convergence in 
asynchronous subgradient methods even for convex problems. 

Finally, the following remarks on the online nature of the 
algorithm and the implementation of the Lagrangian maxi- 
mizations in iT% are in order. 

Remark 1. Algorithm 1 has several characteristics of an 
online adaptive algorithm. In particular, prior knowledge of the 
channel distribution is not needed in order to run the algorithm 
since the expectation operations are replaced by averaging over 
channel realizations on the fly. Likewise, running averages 
need not be evaluated; Proposition |5] ensures that the corre- 
sponding long-term averages will be near-optimal. Further, if 
at some time the network topology changes and the algorithm 
keeps running, it would be equivalent to restarting the entire 
algorithm with the current state as initialization. The algorithm 
is adaptive in this sense. 

Remark 2. Each of the maximization operations (|19ab -( fT9e] i 
is easy, because it involves a single variable, concave objective, 
box constraints, and locally available Lagrange multipliers. 
The power control subproblem ( |19fl l however may be hard 
and require centralized computation in order to obtain a 
(near-) optimal solution. For the conflict graph model, see 
lfT3l . Il28l and references therein for a list of approximate 
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x(m) 

Fig. 1. The wireless network used in the simulations. The edges indicate 
the neighborhood of each node. The thickness of the edges is proportional to 
the mean of the corresponding channel. 



TABLE I 
Simulation Parameters 



F 


2 




Exponential with mean h^^ = 0.1{dij /do)^^ 


hi 


for all G G and /, where do = 20m and 


dij is the distance between the nodes i and j; 
links are reciprocal, i.e., h^^ = h^^. 




Noise power, evaluated using dij = lOOra in the 
expression for hi above 




5 W/Hz for all / 


„max 
^ i 


5 W/Hz for all i e Af 


m 
"■max 


5 bps/Hz for all m 


m 
"mill 


10"'' bps/Hz for all m 


„max 


interference-free capacity obtained for each j E J via 


zJ 


waterfilhng under E ^J2f P^C^ij)] < P™"" ^"'^ all j e 




c^f /2 for all (i, J) e A 


^max 


zfJ''/2 for j e J andieN 


C/m(a™) 


\n{a"^) for all m 


ViiPi) 


lOp^ for alH e 



algorithms. For the SINR model, solutions of ( |19fl ) could be 
based on approximation techniques in power control for digital 
subscriber lines (DSL) — see e.g., E3l and references therein — 
and efficient message passing protocols as in ifTTl . 

V. Numerical Tests 

The asynchronous algorithm developed in Section |IV] is 
simulated on the wireless network shown in Fig. [T] The net- 
work has 8 nodes placed on a 300m x 300m area. Hyperarcs 
originating from node i are denoted by (i, J) £ A where 
J e 2^'*' \0 i.e., the power set of the neighbors of i excluding 
the empty set. For instance, hyperarcs originating from node 1 
are (1, {2}), (1, {8}) and (1, {2, 8}). The network supports the 
two multicast sessions Si ~ {1, {4, 6}} and 52 = {4, {1, 7}}. 
Table U lists the parameter values used in the simulation. 

The conflict graph model of Example 1 with secondary 
interference constraints is used. In order to solve the power 
control subproblem (|19fl l. we need to enumerate all possible 







1 








Pbcst(.5) . 


i 

















1 1 , , , , 1 

1000 2000 3000 4000 5000 



Fig. 2. Evolution of the utility function f{y{s)) and best dual value 
Pbcst(s) = minf<s £i(C{^)) for e = 0.15 and S = 50. 

sets of conflict free hyperarcs (cf. Example 1); these sets are 
called matchings. At each time slot, the aim is to find the 
matching that maximizes the objective function J^f {i j) llj- 
Note that since is a positive quantity, only maximal 
matchings, i.e., matchings with maximum possible cardinality, 
need to be considered. At each time slot, the following two 
steps are carried out. 

51) Find the optimal power allocation for each maximal 
matching. Note that the capacity of an active hyperarc 
is a function of the power allocation over that hyperarc 
alone [cf. (O and ©I. Thus, the maximization in ( |19f1 l 
can be solved separately for each hyperarc and tone. 
The resulting objective [cf. ( |19g| l] is a concave function 
in a single variable, admitting an easy waterfilling-type 
solution. 

52) Evaluate the objective function ( |19fl ) for each maximal 
matching and for powers found in Step 2, and choose 
the matching with the highest resulting objective value. 

It is well known that the enumeration of hyperarc matchings 
requires exponential complexity lfT3ll . Since the problem at 
hand is small, full enumeration is used. 

Fig. |2] shows the evolution of the utility function /(y(s)) 
and the best dual value up to the current iteration. The utility 
function is evaluated using the running average of the primal 
iterates [cf. (l24bl. It can be seen that after a certain number 
of iterations, the primal and dual values remain very close 
corroborating the vanishing duality gap. 

Fig. [3] shows the evolution of the utility function for 
different values of 5*. Again the utility function converges to a 
near-optimal value after sufficient number of iterations. Note 
however that the gap from the optimal dual value increases 
for large values of S, such as S* = 60 (cf. Proposition |5]l. 

Finally, Fig. |4] shows the optimal values of certain opti- 
mization variables. Specifically the two subplots show all the 
virtual flows to given sinks for each of the multicast sessions, 
namely, {s^ — l^t — 0} and {s^ ~ A,t — 7}, respectively. 
The thickness and the gray level of the edges is proportional 
to the magnitude of the virtual flows. It can be observed that 
most virtual flows are concentrated along the shorter paths 
between the source and the sink. Also, the radius of the circles 
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Fig. 3. Evolution of the utility function f{y{s)) for different values of S 
with stepsize e = 0.15. 



representing the nodes is proportional to the optimal average 
power consumption. It can be seen that the inner nodes 2, 4, 
6, and 8 consume more power than the outer ones, 1, 3, 5, 
and 7. This is because the inner nodes have more neighbors, 
and thus more opportunities to transmit. Moreover, the outer 
nodes are all close to their neighbors. 

VI. Conclusions 

This paper formulates a cross-layer optimization problem 
for multicast networks where nodes perform intra-session 
network coding, and operate over fading broadcast links. 
Zero duality gap is established, rendering layered architectures 
optimal. 

Leveraging this result, an adaptation of the subgradient 
method suitable for network control is also developed. The 
method is asynchronous, because the physical layer returns 
its contribution to the subgradient vector with delay. Using 
the subgradient vector, primal iterates in turn dictate routing, 
network coding, and resource allocation. It is established 
that network variables, such as the long-term average rates 
admitted into the network layer, converge to near-optimal 
values, and the suboptimality bound is provided explicitly as 
a function of the delay in the subgradient evaluation. 

Appendix A 

Strong Duality for the Networking Problem fT4[ 

This appendix formulates a general version of problem 
(fl4] l. and gives results about its duality gap. Let h be the 
random channel vector in := M!^, where R_|_ denotes the 
nonnegative reals, and rfh the dimensionality of h. Let V be 
the cr-field of Borel sets in il, and Ph the distribution of h, 
which is a probability measure on V. 

As in (fl4] i. consider two optimization variables: the vector 
y constrained to a subset By of the Euclidean space H^y; 
and the function p : f2 — > R'^p belonging to an appro- 
priate set of functions P. In the networking problem, the 
aforementioned function is the power allocation p(h), and 
set V consists of the power allocation functions satisfying 
instantaneous constraints, such as spectral mask or hyperarc 
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Fig. 4. Some of the optimal primal values after 5000 iterations with e = 0.15 
and S = 40. The gray level of the edges conesponds to values of virtual flows 
according to the color bar on the right, with units bps/Hz. 



scheduling constraints (cf. also Examples 1 and 2). Henceforth, 
the function variable will be denoted by p instead of p(h), 
for brevity. Let 11 be a subset of R'^p. Then V is defined as 
the set of functions taking values in 11. 

V := {p measurable | p(h) G 11 for almost all h G O}. (35) 

The network optimization problem (fT4l l can be written in 
the general form 

P= max /(y) (36a) 
subj. to g(y) + E[v(p(h), h)] < (36b) 
y G ^?y, p G (36c) 

where g and v are R'^-valued functions describing d con- 
straints. The formulation also subsumes similar problems in 
the unicast routing framework such as those in ll22l . ||23l . 

Evidently, problem ( fT4] i is a special case of ( |36] |. If inequal- 
ities (I14bl i-( ll4fb are rearranged to have zeros on the right hand 
side, function v(p(h),h) will simply have zeros in the en- 
tries that correspond to constraints ( |14bt -( fl"4d] i. The function 
q(y,p(h)) defined before ( fTST i equals g(y) + E[v(p(h), h)]. 

The following assumptions regarding (l36T l are made: 



IEEE/ACM TRANSACTIONS ON NETWORKING (ACCEPTED) 



10 



ASl. Constraint set By is convex, closed, bounded, and in the 
interior of the domains of functions /(y) and g(y). Set 11 is 
closed, bounded, and in the interior of the domain of function 
v(.,h) for all h. 

ASl. Function /(•) is concave, g(-) is convex, and v(p(h), h) 
is integrable whenever p is measurable. Furthermore, there is 
a G > such that ||E[v(p(h), h)] || < G, whenever peV. 

ASS. Random vector h is continuous^ and 

AS4. There exist y' e By and p' e V such that ( I36bl ) holds 
as strict inequality (Slater constraint qualification). 

Note that these assumptions are natural for the network 
optimization problem ( fT4] i. Specifically, By are the box con- 

, Cij, and pf, and 11 gives the 



straints for variables a" 

instantaneous power allocation constraints. The function /(y) 
is selected concave and g{y) is linear. Moreover, the entries 
of v(p(h), h) corresponding to ( I14fl ) are bounded because the 
set n is bounded. For the same reason, the ergodic capacities 
E[C/j(p(h),h)] are bounded. 

While ( |36] | is not convex in general, it is separable 1291 Sec. 
5.1.6]. The Lagrangian (keeping constraints ( I36cb implicit) and 
the dual function are, respectively [cf. also ( fTSl l and ( fT6] l1 

^y, P, C) = /(y) - C"^ (g(y) + E[v(p(h), h)]) (37) 

giC):^ max /:(y,p,C)= ^'(0+0(0- (38) 
where Q denotes the vector of Lagrange multipliers and 

^(C) := max {/(y) - C^g(y)} (39a) 

yGoy 

0(C) := maxC^E[v(p(h), h). (39b) 

pGP 

The additive form of the dual function is a consequence of 
the separable structure of the Lagrangian. Further, AS[T] and 
AS|2] ensure that the domain of g{C) is M'*. Finally, the dual 
problem becomes [cf. also (fTTI il 



D = mill piC)- 
C>o 



(40) 



As p varies in V, define the range of E[v(p(h), h)] as 

TZ:= {w e R'^ \w = E[v(p(h),h)] for some peV} . (41) 

The following lemma demonstrating the convexity of TZ plays 
a central role in establishing the zero duality gap of (|36] |, and in 
the recovery of primal variables from the subgradient method. 

Lemma 1. If A^A$3\ hold, then the set TZ is convex. 

The proof relies on Lyapunov's convexity theorem 1301 . 
Recently, an extension of Lyapunov's theorem l30l Extension 
1] has been applied to show zero duality gap of power control 
problems in DSL ||26l . This extension however does not apply 
here, as indicated in the ensuing proof. In a related contribution 
II22I . it is shown that the perturbation function of a problem 
similar to (|36] l is convex; the claim of Lemma [T] though is 
quite different. 

' Formally, this is equivalent to saying that is absolutely continuous with 
respect to the Lebesgue measure on . In more practical terms, it means 
that h has a probability density function without deltas. 



Proof of LemmaU] Let ri and r2 denote arbitrary points 
in TZ, and let a e (0, 1) be arbitrary. By the definition of TZ, 
there are functions pi and p2 in P such that 

ri = y v(pi(h),h)dPh andra = J v(p2(h), h)dPh. (42) 



Now define 



u{E) 



v(pi(h),h)dPh 

v(p2(h),h)dPh 



, E eV. (43) 



The set function u(£') is a nonatomic vector measure on 
V, because is nonatomic (cf. ASO and the functions 
v(pi(h), h) and v(p2(h), h) are integrable (cf. AS|2]i; see ||3T1 
for definitions. Hence, Lyapunov's theorem applies to u{E); 
see also lt30l Extension 1] and Ii22i Lemma 1]. 

Specifically, consider a null set $ in V, i.e., a set with 
-Ph(*) = 0, and the whole space ^2 £ 2?. It holds that 
u($) = and u(r2) = [r|^,r^]^. For the chosen a, 
Lyapunov's theorem asserts that there exists a set Ea G T> 
such that (E^ denotes the complement of Ea) 



u{Ea) = au{n) + (1 - a)u{^) = a 

u{E'^) = u(0) - u{Ea) = (1 - a) 
Now using these Ea and Ea, define 

Ip2(h), heE^^. 



(44a) 
(44b) 

(45) 



It is easy to show that PQ(h) G V. In particular, the function 
PQ,(h) can written as PQ,(h) = pi(h)l£;^ +p2(h)l£;c, where 
1^; is the indicator function of a set G V. Hence it is 
measurable, as sum of measurable functions. Moreover, we 
have that PaCh) G 11 for almost all h, because Pi(h) and 
P2(h) satisfy this property. The need to show PaCh) G P 
makes ll30l Extension 1] not directly applicable here. 
Thus, PQ(h) G "P and satisfies [cf. ( I4?l i1 



v(p„(h),h)dPh = / v(pi(h),h)dPh 



v(p2(h),h)dPh = ari + (1 -a)r2. (46) 



Therefore, ari + (1 — a)r2 E TZ. □ 
Finally, the zero duality gap result follows from Lemma [T] 
and is stated in the following proposition. 

Proposition 6. If A3iJ}A3S\ hold, then problem (l36T l has zero 
duality gap, i.e., P = D. Furthermore, the values P and D are 
finite, the dual problem (140b has an optimal solution, and the 
set of optimal solutions of (140b is bounded. 

Proof: Function /(y) is continuous on By since it is 
convex (cf. ASffland AS© 1271 Prop. 1.4.6]. This, combined 
with the compactness of By, shows that the optimal primal 
value P is finite. Consider the set 

W:={{wi,...,Wd,u)eR''+^ I f{y)<u, 
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g(y) + E[v(p(h), h)] < w for some y e By, p G . 

(47) 

Using Lemma [T] it is easy to verify that set W is convex. 
The rest of the proof follows that of [29 , Prop. 5.3.1 and 5. 1 .4], 
using the finiteness of P and Slater constraint qualification (cf. 
AS©. 

The boundedness of the optimal dual set is a standard result 
for convex problems under Slater constraint qualification and 
finiteness of optimal primal value; see e.g., 127., Prop. 6.4.3] 
and 1251 p. 1762]. The proof holds also in the present setup 
since P is finite, P = D, and AS|4] holds. □ 

Appendix B 
Dual and Primal Convergence Results 

This appendix formulates the synchronous and asyn- 
chronous subgradient methods for the generic problem ( [36] i; 
and establishes the convergence claims in Propositions |2]-|5] 
Note that Propositions |2] and |3] follow from Propositions |4] 
and [5] respectively, upon setting the delay D = 0. 

Starting from an arbitrary ^(1) > 0, the subgradient 
iterations for (gOll indexed by £ G IN are [cf. also (fTsT ll 

y{e) G arg max {/(y) - C^(^)g(y)} (48a) 

p(.; i) G arg max C^(^)IE[v(p(h), h)] (48b) 
pev 

and cii+i) = m) + ^m)+Hm^ m 

where g{£) and v(£) are the subgradients of functions ip{C) 
and (p{C), defined as [cf. also ( l22] il 

m ■■= g(yW) (49a) 
E[v(p(h;£),h)]. (49b) 

The iteration in (|48c| i is synchronous, because at every i, both 
maximizations ( 148 at and (|48bt are performed using the current 
Lagrange multiplier C(^)- An asynchronous method is also of 
interest and operates as follows. Here, the component v of the 
overall subgradient used at i does not necessarily correspond 
to the Lagrange multiplier C{^)^ but to the Lagrange multiplier 
at a time t{£) < £. Noting that the maximizer in (I48bb is 
p(.; t{£))) and the corresponding subgradient component used 
at £ is v{t{£)), the iteration takes the form 

+ 1) = im + (- m) + *(tW))]+ ,£eK. (50) 

The difference £ — t{£) is the delay with which the subgradient 
component ii becomes available. In Algorithm 1 for example, 
the delayed components are Cij{T{£)) and Pi{T{£)). 

Next, we proceed to analyze the convergence of ( fSOl l. 
Function g(y) is continuous on By because it is convex 1271 
Prop. 1.4.6]. Then ASH] and AS|2| imply that there exists a 
bound G such that for all y G ;By and p G T', 

|Ig(y)+E[v(p(h),h)]|| <G. (51) 

Due to this bound on the subgradient norm, algorithm (jSOl l 
can be viewed as a special case of an approximate subgradient 
method ||32| . We do not follow this line of analysis here 
though, because it does not take advantage of the source of 
the error in the subgradient — namely, that an old maximizer 
of the Lagrangian is used. Moreover, algorithm (ISOl l can be 



viewed as a particular case of an e-subgradient method (see 
Ii29l Sec. 6.3.2] for definitions). This connection is made in 
[i24l which only deals with diminishing stepsizes; here results 
are proved for constant stepsizes. The following assumption 
is adopted for the delay £ — t{£). 

ASS. There exists a finite £) G IN such that £ -t{£) < D for 
all £ G IN. 

A$5] holds for Algorithm 1 since the maximum delay there 
is D = 25* — 1. The following lemma collects the results 
needed for Propositions |2| and |4| Specifically, it characterizes 
the error term in the subgradient definition when -~v{t{£)) is 
used; and also relates successive iterates C{£) and C{£ + 1). 
The quantity G in the ensuing statement was defined in AS|2] 

Lemma 2. Under A5[7}A4I] the following hold for the se- 
quence {C{£)} generated by ^ for all 6 > 

a) -v'^iri£))ie-Ci£)) 

< (j){d) - (f>iC{£)) + 2eDGG (52a) 

b) -{g{£) + v{r{e))f{d-m) 

< q{0) - £>(C(^)) + 2eDGG (52b) 

c) \\c{£ + i)-ef-\\c{£)-e\\' 

< 2e [g{e) - g{C{£))] + e^G^ + Ae^DGG (52c) 

Parts (a) and (b) of Lemma |2] assert that the vectors 
—v{t{£)) and —g{£)~v{T{£)) are respectively e-subgradients 
of 0(C) and the dual function g{C) at C(^), with e = 2eDGG. 
Note that £ is a constant proportional to the delay D. 

Proof of Lemma |2|- a) The left-hand side of (15 2 at is 

-v'^im) (6 - m) = ~v^ir{£)) [e am)] 

-v^{r{£))[C{r{£))-C{£)]. (53) 

Applying the definition of the subgradient for (j>{C) at (^{t{£)) 
to it follows that 

-v^{r{£)){e-m)<m-mrm 

- v^{t{£)) [C{t{£)) - cm ■ (54) 

Now, adding and subtracting the same terms in the right- 
hand side of ( i54l i. we obtain 

-v^{T{£)){e-m)<m~mt)) 

+ [0(C(r(£) + ^))-0(C(r(€) + ^-l))] 

K = l 

1-t(1) 

- v^{r{£))[C{r{£)+^-l)-C{r{£) + K)]. (55) 

K = l 

Applying the definition of the subgradient for (/)(C) at C{t{£) + 
k) to ( i551 l. it follows that 

-v^irii)){9-m)<m-Hm) 

+ Y v'^{ri£)+K)[aT{£) + K-l)-C{Ti£)+K)] 

K = l 

- Y v'^iri£))[Ciri£)+K-l)-CiT{£) + K)]. (56) 

re=l 
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Using the Cauchy-Schwartz inqeuality, d56] l becomes 

\\ + Mrm\) 



£-t{£) 



K = l 



•||C(tW + ^c-i)-C(tW + «)||. (57) 

Now, write the subgradient iteration [cf. ( I50l l1 at t(^)+k— 1: 

am + ^) = [c(r(^) + « - 1) 



+ e(g(T(£) + ^ - 1) + -u(t(t(£) + ^ - 1)))] ' 



(58) 



Subtracting C('''(^) + ^ 1) from both sides of the latter 
and using the nonexpansive property of the projection 
Prop. 2.2.1] followed by (|5T]l, one finds from (|58]l that 



\\ar{i) + ^)-C{r{i) + ^-m 
< e ||g(T(£) +K-l)+ v{t{t{£) + k 



l))||<eG'. (59) 



Finally, recall that \\vii)\\ < G for all ^ e IN (cf. ASEJ, 
and i ~ r(£) < D for all £ G IN (cf. ASS. Applying the two 
aforementioned assumptions and ( |59] l to ( |57T i. we obtain ( |52a| ). 

b) This part follows readily from part a), using ( [38] l and the 
definition of the subgradient for V'(C) at ^(^) [cf. ( 149 al l 1. 

c) We have from dSOl) for all > that 



\ai+i)-0\\ 



e 



(60) 



Due to the nonexpansive property of the projection, it follows 
that 

m + 1) - df < \m + e im + virm - ef 



\m-o\r + e'\\m+vir 

+ 2eigie) + Hri£))f iCie)~0). (61) 



Introducing ( I52bb and dSTT i into dFTT l, ( I52cl ) follows. □ 
The main convergence results for the synchronous and 
asynchronous subgradient methods are given by Propositions 
|2]and|4] respectively. Using Lemma |2] Proposition |4] is proved 
next. 

Proof of Proposition |4} a) Let be an arbitrary dual 
solution. With g; and denoting the i-th entries of g and v, 
respectively, define 



S :-- 



min {-g,(y')-E[v,(p'(h),h)]} 

l<t<d 



(62) 



where y' and p' are the strictly feasible variables in AS|4] Note 
that 6 > due to AS|4] 

We show that the following relation holds for all £ > 1: 



||C(^)-ril<max{||C(l)-ril 



^(D-/(y')) + -^ 



eG 



(63) 



Eq. ( |63] | implies that the sequence of Lagrange multipliers 
{C(^)} is bounded, because the optimal dual set is bounded 
(cf. Proposition |6]l. Next, ( |63] ) is shown by induction. 

It obviously holds for £ = 1. Assume it holds for some 
f G IN. It is proved next that it holds for £+1. Two cases are 
considered, depending on the value of g{C{i))- 



Case 1: g{CW) > D + + 2eDGG. Then eq. ( |52cl ) 

with = a and g(C*) = D becomes 

iic(^ + i)-cf < iicw-rf 

- 2e [g{C{£) - D - (G^/2 - 2eDGG] . (64) 

The square-bracketed quantity in ( |64l i is positive due to the 
assumption of Case 1. Then ( l64l ) implies that \\C{£ + 1) — 
C1P < IK(^)-C*IP, and the desired relation holds for £+1. 

Case 2: g{C{£)) < D + eG^/2 + 2eDGG. It follows 
from ( |50b , the nonexpansive property of the projection, the 
triangle inequality, and the bound ( BTl ) that 

cii < \\m + <m+^{r{£))) 

<llCWII + IICII+eG 



m+i) 



(65a) 
(65b) 



Next, a bound on ||C(^)|| is developed. Specifically, it holds 
due to the definition of the dual function [cf. ( l38T l1 that 

g{m)= max {/(y) - C^(£)(g(y) + E[v(p(h), h)]) } 

> /(y') - C"^ W(g(y') + E[v(p'(h), h)]). (66) 



Rewriting the inner product in ( 166b using the entries of the 
corresponding vectors and substituting ( |62] | into ( l66b using 
C > 0, it follows that 



sY^m < -Ecrw(g^(y')+iE[v,(p'(h),h)]) 



1=1 



<g{m)-f{y')- 



(67) 



Using |1C(^)|1 < Ef=i C(^) into the following bound is 
obtained: 

\\m\\<\{Qm))-f{y'))- (68) 

Introducing ( |68] ) into ( |65bb and using the assumption of 
Case 2, the desired relation ( |63l ) holds for ^ + 1. 
b) Set e = C and ^?(6») = g{C) = D in (l52cl i: 



|C(€ + i)-C 



l'<IICW-Cf 



2e [D - g[Cm ■ 



Ae^DGG 



(69) 



Summing the latter for £ = 1 , . . . , s, and introducing the 
quantity mini<£<s g(C(^))> it follows that 

||C(^+l)-Cll' < WCi'i^) - cf + se^G^ + ise^DGG 

+ 2seD-2se min g(Ci£))- (70) 

i<e<s 

Substituting the left-hand side of ( iTOl i with 0, rearranging the 
resulting inequality, and dividing by 2es, we obtain 

2ej:>GG+ 1. " . (71) 



eG^ 
2 

note that lim„ 



min g(C(£))<D 

KKs 



2es 



Now, note that lims_i.oo miiii<^<s f?(C(^)) exists, be- 
cause mini<^<s ^?(C(^)) is monotone decreasing in s 
and lower-bounded by D, which is finite. Moreover, 
lims_s.oo IIC(l) ~ C* ir/(2es) = 0, because C* is bounded. 
Thus, taking the limit as s — !> oo in ( fTTT i. yields ( |32] |. □ 
Note that the sequence of Lagrange multipliers in the 
synchronous algorithm ( I48cb is bounded. This was shown for 
convex primal problems in [25 , Lemma 3]. Interestingly, the 
proof also applies in the present case since A^T]-A$4]hold and 
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imply finite optimal P = D. Furthermore, Proposition |2] for the 
synchronous method follows from ||27l Prop. 8.2.3], ll22l . 

Next, the convergence of primal variables through running 
averages is considered. The following lemma collects the 
intermediate results for the averaged sequence {y(s)} [cf. 
(|24]|1. and is used to establish convergence for the generic 
problem ( |36] ) with asynchronous subgradient updates as in 
dSOl l. Note that y(s) e By, s > 1, because (l24l i represents 
a convex combination of the points {y(l), . . . , y(s)}. 

Lemma 3. Under Ai[7}Ai|5] with denoting an optimal 
Lagrange multiplier vector, there exists a sequence {p(.;s)} 
in V such that for any s g IN, it holds that 



a) [g(y(s)) + 

b) /(y(.s)) > D 

c) /(y(s)) < D 



E[v(p(h;s),h)]] + 
2 

g(y(s)) - 



< 



IIC(.s + i) 



IIC(i)l 

2e.s 



2eDGG 
E[v(p(h; s),h)] 



(72a) 
(72b) 

(72c) 



Eq. ( |72a| ) is an upper bound on the constraint violation, 
while ( |72b| i and ( |72c| ) provide lower and upper bounds on the 
objective function at y(s). Lemma [3] relies on Lemma [T] and 
the fact that the averaged sequence {y(s)} is generated from 
maximizers of the Lagrangian {y(^)} that are not outdated. 
Proof of Lemma |5} a) It follows from dSOl l that 

C(^+l) >CW + e(9W + f(rW)). (73) 
s, using ^(1) > 0, and dividing 



Summing ( |73] | over I ~ 1, 
by 2es, it follows that 



1) 



(74) 



1=1 



Now, recall the definitions of the subgradients g{t) and 
v{T{t)) in ( |49] l. Due to the convexity of g(-), it holds that 

g(y(s))<-Eg(yW) = -E9W- (75) 

^ e=i ^ e=i 

Due to Lemma [T] there exists p(h; s) in V such that 

E[v(p(h; s),h)]^-j2 E[v(p(h; r(£)), h)] = 1 ^ * (r^). 



£=1 



Combining (|74l), (|75ll, and (|76l), it follows that 

g(y(,s))+E[v(p(h;s),h)] < ^^^±^ 



(76) 



(77) 



Using ^(.s + 1) > and the fact that [.]+ is a nonnegative 
vector, ( I72al i follows easily from (ITTT i. 

b) Due to the concavity of /(•), it holds that /(y(s)) > 
- J2t=i /(y(^))- Adding and subtracting the same terms, 
C^(^)g(yW) and C^(^)E[v(p(h; r(£)), h)] for £ = 1, . . . , s, 
to the right-hand side of the latter, and using f{y{t)) ~ 
C^(f)g(y(£)) = i^iCii)) [cf. (ESili and (Ol, it follows that 



/(y(s)) > i^[^(CW) -C''WE[v(p(h;r(^)),h)] 



+ jE^''W(g(yW)+lE[^(P(h;^W)>h)]). (78) 

£=1 

Now recall that E[v(p(h; t(£)), h)] = i3(T(£)) [cf. (|49b] l1. 
Thus, it holds that 

-C^(£)E[v(p(h;r(€)),h)] 

= -C^(rW)i}(r(f)) + v^{T{i)) [C{t{£)) ~ CW] • (79) 

The first term in the right-hand side of ( |79] l is (/)(^(t(£))) ([cf. 
(I48bl i and (l38Tl1. The second term can be lower-bounded using 
Lemma |2ja) with 6 ~ C{t{£)). Then, ( |79] l becomes 

- WE[v(p(h; t(£)), h)] > <^(CW) - 2e7^GG (80) 



Using dSOll into ^ and V(C(^)) + 0(C(^)) = ^(CW) > D, 
it follows that 

/(y(s)) > D - 2eDGG 

+ jE^^w(g(yW)+EWp(h;^w)'h)])- (81) 

e=i 

Moreover, it follows from (ISOl l and the nonexpansive prop- 
erty of the projection that 

IIC(^ + i)f <IICWf 

+ 2eeii) (g(yW) + E[v(p(h; t{£)), h)]) 



+ e2||g(y(£))+E[v(p(h;^),h)] 



(82) 



Summing (|82] | for £ = 1, . . . , s, dividing by 2es, and intro- 
ducing the bound (ISTT i on the subgradient norm yield 

i^C^W(g(y(f))+E[v(p(h;€),h)]) 

eG^ , ||C(s + l)f -||C(1)|'2 



£=1 



> 



+ 



(83) 



2 2es 

Using ( |83] ) into dSTT i together with ||C(s+l)|p > 0, one arrives 
readily at ( |72b] i. 

c) Let (^* be an optimal dual solution. It holds that 

/(y(s)) = /(y(s)) - C*^(g(y(.s)) + E[v(p(h; h)]) 

+ C*^(g(y(s)) + IE[v(p(h; s), h]) (84) 

where p(h; s) was defined in part (a) [cf. (|76]|1. 

By the definitions of D and C* [cf- (I40li1, and the dual 
function [cf. (|38]l], it holds that 

D = ^?(C)= max £(y,p,r)>/:(y,P,r). (85) 
yeoy.peP 



Substituting the latter into ( I84b . it follows that 

/(y(s)) < D + C^(g(y(s)) + E[v(p(h; ,s), h)]) . (86) 
Because C* > and < [0]+ for all 6, §6^ impHes that 

(87) 

Applying the Cauchy-Schwartz inequality to the latter, ( I72cl i 
follows readily. □ 
Using Lemma |3] the main convergence results for the 
synchronous and asynchronous subgradient methods are given 
correspondingly by Propositions [3] and |5] after substituting 



/(y(s)) < D + [g(y(s)) + E[v(p(h; s), h)] 



q(y(s), p(h; s)) - g(y(.s)) + E[v(p(h; s)] 



(88) 
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Proof of Proposition |5} a) Take limits on both sides 
of illai as s — s> oo, and use the boundedness of {C(s)}- 

b) Using P = D and taking the liminf in ( |72b| i. we 
obtain (|34a| i. Moreover, using P = D, ( |72at . the boundedness 
of ||C*||, and taking limsup in illci . (I34bb follows. □ 
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